Add default target_modules for nemotron_h hybrid Mamba-MoE models #3289
Open
A1c0r-Z wants to merge 1 commit into
Registers "nemotron_h" in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING (and the ADALORA / VBLORA / WAVEFT mappings) with defaults [q_proj, k_proj, v_proj, o_proj], so users can call LoraConfig() on Nemotron-3 without specifying target_modules.

Defaults intentionally exclude the Mamba mixer's in_proj / out_proj / conv1d, which are blocked by the compatibility check added in huggingface#2562. nemotron_h is also added to mamba_model_types so that check applies.

Adds two unit tests under TestDefaultTargetModules:
- verifies the constants for all 4 mappings
- verifies the Mamba check fires for nemotron_h + out_proj

Follows the gemma4 precedent set in huggingface#3136.
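To make the intended effect concrete, here is a minimal standalone sketch of a default-target lookup keyed by model type. It is not PEFT's code: the dictionary DEFAULT_LORA_TARGETS and the helper resolve_target_modules are hypothetical stand-ins, and only the "nemotron_h" entry and its four module names come from this PR.

```python
# Hypothetical stand-in for a model_type -> default target_modules mapping.
# Only the "nemotron_h" entry is taken from this PR; nothing here is PEFT's API.
DEFAULT_LORA_TARGETS = {
    # Attention projections only; Mamba mixer modules are deliberately absent.
    "nemotron_h": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def resolve_target_modules(model_type, target_modules=None):
    """Return the explicit target_modules, else the registered default."""
    if target_modules is not None:
        # An explicit choice always wins over the registered default.
        return list(target_modules)
    if model_type not in DEFAULT_LORA_TARGETS:
        # No default registered: the caller has to pick modules by hand.
        raise ValueError("Please specify target_modules")
    return list(DEFAULT_LORA_TARGETS[model_type])


if __name__ == "__main__":
    print(resolve_target_modules("nemotron_h"))
```

Under this sketch, an unregistered model type fails fast, while a registered one resolves to the attention projections without the caller naming them.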
Add default target_modules for Nemotron-H
What this does
Registers default target_modules for the nemotron_h model type so that LoraConfig() (and other PEFT methods) can be used on NVIDIA's Nemotron-3 hybrid Mamba + MoE + Attention models without users having to specify target_modules manually.

Defaults target the attention projections only (q_proj, k_proj, v_proj, o_proj). The Mamba mixer's in_proj / out_proj / conv1d are intentionally excluded because they belong to the Mamba block and are forbidden by the Mamba-architecture compatibility check added in #2562. nemotron_h is also added to mamba_model_types so the compatibility check applies to it.

Why
Without these defaults, calling LoraConfig() on a Nemotron-H model raises "Please specify target_modules". NVIDIA's own NeMo Automodel LoRA cookbook for Nemotron-3 works around this by passing exclude_modules=["*.out_proj"] and bypasses PEFT for the inner LoRA application. Extending PEFT to handle nemotron_h natively fixes the friction upstream.

What changed
- src/peft/utils/constants.py: adds "nemotron_h" to four mappings, following the precedent established in #3136 (gemma4):
  - TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
  - TRANSFORMERS_MODELS_TO_ADALORA_TARGET_MODULES_MAPPING
  - TRANSFORMERS_MODELS_TO_VBLORA_TARGET_MODULES_MAPPING
  - TRANSFORMERS_MODELS_TO_WAVEFT_TARGET_MODULES_MAPPING
- src/peft/tuners/tuners_utils.py: adds "nemotron_h" to mamba_model_types in _check_lora_target_modules_mamba so the Mamba forbidden-module check (out_proj, conv1d) applies. This protects users who might explicitly try to target those names without realizing they belong to the Mamba mixer.
- tests/test_custom_models.py: adds two tests under TestDefaultTargetModules:
  - test_default_target_modules_nemotron_h: verifies the constant lookup for all four mappings and asserts the defaults do not include forbidden Mamba modules.
  - test_nemotron_h_blocks_mamba_modules: verifies the Mamba compatibility check raises when a user explicitly targets out_proj on a model whose model_type is "nemotron_h".

Verified
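For reference, the behaviour that test_nemotron_h_blocks_mamba_modules is meant to verify can be illustrated without PEFT installed. The sketch below is a hypothetical simplification, not the body of _check_lora_target_modules_mamba: the function name check_mamba_targets, the contents of MAMBA_MODEL_TYPES (an assumed subset), and the error wording are all illustrative. Only the forbidden names out_proj and conv1d and the "nemotron_h" model type come from the description above.

```python
# Assumed subset of Mamba-style model types; the real list lives in PEFT.
MAMBA_MODEL_TYPES = {"mamba", "nemotron_h"}
# Forbidden module names as stated in the PR description.
FORBIDDEN_MAMBA_MODULES = ("out_proj", "conv1d")


def check_mamba_targets(model_type, target_modules):
    """Raise ValueError if a Mamba-style model targets a forbidden module."""
    if model_type not in MAMBA_MODEL_TYPES:
        return  # the check only applies to registered Mamba-style models
    # Compare the last dotted component so "mixer.out_proj" is also caught,
    # while the attention projection "o_proj" is not.
    blocked = [
        name for name in target_modules
        if name.split(".")[-1] in FORBIDDEN_MAMBA_MODULES
    ]
    if blocked:
        raise ValueError(
            f"Modules {blocked} cannot be targeted for model_type={model_type!r}"
        )


if __name__ == "__main__":
    # The defaults from this PR pass the check.
    check_mamba_targets("nemotron_h", ["q_proj", "k_proj", "v_proj", "o_proj"])
    try:
        check_mamba_targets("nemotron_h", ["out_proj"])
    except ValueError as err:
        print(err)
```

The exact-name comparison matters here: o_proj (attention) and out_proj (Mamba mixer) differ by two characters, so a substring match would be the wrong tool.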
Local syntax check passes (ast.parse on all three files). Test suite run pending environment install.

Related